Skip to content

Support encrypted VACUUM INTO destinations via file URIs#6107

Open
tjthiagocosta wants to merge 4 commits intotursodatabase:mainfrom
tjthiagocosta:fix/vacuum-into-uri
Open

Support encrypted VACUUM INTO destinations via file URIs#6107
tjthiagocosta wants to merge 4 commits intotursodatabase:mainfrom
tjthiagocosta:fix/vacuum-into-uri

Conversation

@tjthiagocosta
Copy link

Description

Allow VACUUM INTO destinations to be specified as file: URIs with
cipher and hexkey query parameters, enabling encrypt, re-encrypt,
and decrypt workflows in a single statement.

Also fixes URI-target edge cases that surfaced during implementation:
explicit get_flags() matching for all OpenMode variants, and
centralized encryption-option parsing via OpenOptions::get_encryption_opts().

Test plan

  • dbhash parity between source and encrypted/re-encrypted/decrypted destinations
  • SQLite 3.51 open-mode parity (14 seed × target combinations)
  • mode=memory URI targets
  • Encryption-option validation (cipher without hexkey, hexkey without cipher)

AI usage

GPT-5.4 and Claude Opus 4.6 assisted with implementation, review, and testing.

Fixes #5852

Extract the duplicated cipher/hexkey validation logic from
Connection::from_uri and from_uri_attached into a shared
OpenOptions::get_encryption_opts() method.

Replace the catch-all `_ => OpenFlags::default()` in get_flags() with
explicit arms for ReadWrite and Memory, making the match exhaustive over
all OpenMode variants.
Teach VACUUM INTO to accept file: URIs as the destination path,
enabling mode=, modeof=, cipher=, and hexkey= query parameters.

This allows encrypting, re-encrypting, or decrypting a database in a
single VACUUM INTO statement via URI parameters.
Copy link

@turso-bot turso-bot bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please review @LeMikaelF

@penberg
Copy link
Collaborator

penberg commented Mar 23, 2026

@avinassh probably wants to have a look at this too

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends VACUUM INTO to accept file: URI destinations with cipher and hexkey query parameters, enabling encrypt/re-encrypt/decrypt workflows while also tightening SQLite URI open-mode flag handling and centralizing encryption-option parsing.

Changes:

  • Add URI target handling to VACUUM INTO, including encryption parameter support (cipher, hexkey) and open-mode parity behaviors (including mode=memory).
  • Centralize URI encryption-option validation via OpenOptions::get_encryption_opts() and make get_flags() explicitly map all OpenMode variants.
  • Extend dbhash tooling/tests to open and hash encrypted databases for parity assertions.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
tools/dbhash/src/lib.rs Adds hash_database_with_encryption() to open/hash encrypted DBs with provided opts.
tests/integration/query_processing/test_vacuum.rs Adds integration coverage for URI targets (mode parity, memory mode) and encrypted VACUUM INTO workflows.
core/vdbe/execute.rs Implements VACUUM INTO URI classification + destination opening logic + destination encryption setup.
core/util.rs Updates URI mode→flag mapping and adds OpenOptions::get_encryption_opts() validation helper + tests.
core/lib.rs Adds Database::prepare_new_database_encryption() to preconfigure encryption for new DB creation paths.
core/connection.rs Reuses OpenOptions::get_encryption_opts() in URI open paths (deduplicates validation logic).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +1050 to 1056
// Map SQLite URI modes to the corresponding open flags.
Ok(match self.mode {
OpenMode::ReadWriteCreate => OpenFlags::Create,
OpenMode::ReadOnly => OpenFlags::ReadOnly,
_ => OpenFlags::default(),
OpenMode::ReadWrite => OpenFlags::None,
OpenMode::Memory => OpenFlags::Create,
})
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_flags() now maps mode=rw to OpenFlags::None. When the parsed URI path is :memory: (e.g. file::memory:?mode=rw), Connection::from_uri() will still take the in-memory branch but then pass OpenFlags::None into MemoryIO::open_file, which errors with NotFound because Create is not set. Consider special-casing self.path == MEMORY_PATH inside get_flags() (or in the from_uri memory-path branch) so :memory: targets always use OpenFlags::Create regardless of mode, to avoid this regression and keep file::memory: behavior consistent.

Copilot uses AI. Check for mistakes.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! file::memory:?mode=rw would now regress because the in-memory branch gets OpenFlags::None. I’ll special-case MEMORY_PATH in get_flags() and add a regression test for that URI.

Comment on lines +1951 to +1954
let source_uri = format!(
"file:{dest_path_str}?cipher={}&hexkey={}",
source_encryption_opts.cipher, source_encryption_opts.hexkey
);
Copy link

Copilot AI Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The source_uri variable is constructed using dest_path_str and points at the destination DB (with the source encryption params). Renaming it to something like dest_uri_with_source_params would make the intent clearer and avoid confusion when reading the test.

Copilot uses AI. Check for mistakes.
assert_eq!(rows, vec![("ok".to_string(),)]);

Ok(())
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this test supposed to check?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a regression test for the URI open path, specifically to verify that \file::memory:?mode=memory still takes the in-memory branch and behaves like an in-memory database after the URI flag refactor. I’ll keep it for that coverage unless you’d prefer it removed from this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, we can keep it

.exists()
.then(|| std::fs::read(&turso_dest))
.transpose()?;
let turso_literal = sqlite_vacuum_literal(&turso_dest, case.target);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the destination path right, why it is called turso_literal

Copy link
Author

@tjthiagocosta tjthiagocosta Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This variable holds the target string passed to VACUUM INTO '...' (either a plain path or a file URI)
I'll rename it to vacuum_target.

ExistingDb,
}

fn sqlite_vacuum_literal(path: &Path, target: VacuumIntoTargetLiteral) -> String {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this has a sqlite_ prefix

Comment on lines +1831 to +1832
let source_hash = hash_database_path_with_encryption(&turso_source, None)?;
let dest_hash = hash_database_path_with_encryption(&turso_dest, None)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can use the hash_database

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ll simplify those calls.

}

Ok(())
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a good test matrix

}

#[turso_macros::test]
fn test_vacuum_into_uri_encrypts_plain_destination(tmp_db: TempDatabase) -> anyhow::Result<()> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why 'plain' here

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

‘plain’ refers to the source database being plaintext in that test. The current name is misleading, I’ll rename it to test_vacuum_into_uri_encrypts_plaintext_source

Comment on lines +1948 to +1952
"file:{dest_path_str}?cipher={}&hexkey={}",
dest_encryption_opts.cipher, dest_encryption_opts.hexkey
);
let source_uri = format!(
"file:{dest_path_str}?cipher={}&hexkey={}",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are both using same path 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what you want is dest_uri_with_source_encryption_params

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both URIs point to the same destination file. The second URI reopens that destination with the source encryption params to verify the vacuumed output no longer opens with the old key/cipher.

dest_uri_with_source_encryption_params is a clearer name. I’ll rename it accordingly.

text.replace('\'', "''")
}

fn vacuum_into_literal(conn: &Arc<Connection>, dest_literal: &str) -> anyhow::Result<()> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why _literal

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It refers to "string literal", but after review, the helper isn't really pulling its weight;
it only does two things: wraps escape_sqlite_string_literal(...) and calls conn.execute(...). It has 4 call sites, so it's better to remove it and inline the calls

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I suggest you to go over the names / fns and rename them if needed

The get_flags refactor mapped OpenMode::ReadWrite to OpenFlags::None, which broke file::memory:?mode=rw because MemoryIO::open_file requires the Create flag. Force Create in the from_uri memory branch since in-memory databases always need creation regardless of mode.
…pers

Add a regression test for file::memory:?mode=rw and simplify the VACUUM URI test helpers and names so the URI mode coverage is easier to follow.
Comment on lines +13605 to +13608
if uri_opts.path == MEMORY_PATH || matches!(uri_opts.mode, OpenMode::Memory) {
let io: Arc<dyn crate::IO> = Arc::new(crate::MemoryIO::new());
return crate::Database::open_file_with_flags(io, MEMORY_PATH, flags, dest_opts, None);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this allows vacuum into with :memory:, so lets add a test which verifies that it works as expected

Comment on lines +13615 to +13617
if metadata.is_file() && metadata.len() == 0 && !flags.contains(OpenFlags::ReadOnly) {
std::fs::remove_file(&uri_opts.path)
.map_err(|e| crate::error::io_error(e, "remove_file"))?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i dont think we should ever remove / delete files. instead, raise a proper error and ask user to delete the file

Comment on lines -13656 to +13712
)));
}

// make sure to create destination database with same experimental features as source
// Always use PlatformIO for the destination file, even if source is in-memory.
// This ensures VACUUM INTO actually writes to disk.
let io: Arc<dyn crate::IO> = Arc::new(crate::io::PlatformIO::new()?);
// Plain filename destinations keep using PlatformIO so VACUUM INTO writes to disk
// even when the source database is in-memory.
let blocking_io: Arc<dyn crate::IO> = Arc::new(crate::io::PlatformIO::new()?);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the comment edited?

and also why it is renamed to blocking_io?

Comment on lines +13768 to +13770
let preserve_source_reserved_space =
program.connection.get_encryption_cipher_mode().is_none()
&& dest_encryption_opts.is_none();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn;t this confusing? it reads like we are preserving reserved space if encryption is not enabled

"PRAGMA hexkey = '{}'",
escape_sql_string_literal(&dest_encryption_opts.hexkey)
))?;
dest_conn.set_reserved_bytes(cipher_mode.metadata_size() as u8)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use required_reserved_bytes fn even though it is same, for readability

let perms = std::fs::metadata(modeof).map_err(|e| crate::error::io_error(e, "metadata"))?;
std::fs::set_permissions(&uri_opts.path, perms.permissions())
.map_err(|e| crate::error::io_error(e, "set_permissions"))?;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'd verify if this is actually required... because when we open the connection, i thought we set these perms too

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support VACUUM INTO to create an encrypted db

4 participants